Introduction

Welcome to my portfolio for Computational Musicology!

For this portfolio I analyzed two songs using a variety of metrics. These metrics helped me understand how the songs are structured, including aspects such as timbre and tempo. By applying computational methods, I was able to extract meaningful insights and identify patterns within the songs. The analysis gave me a deeper understanding of how musical elements come together to form the unique sound of each track.

For this project I used an AI tool, JenAI, to generate the tunes. The prompts were refined with ChatGPT: I wrote two prompts myself and let ChatGPT turn them into more specific ones.

Song #1

For ‘wietske-b-1.mp3’ I used: Create a modern indie-pop track with a warm, intimate vibe, blending organic acoustic elements with subtle electronic textures. The song should feature delicate yet expressive string arrangements (such as a small string ensemble or chamber-style strings) that add depth and emotion without overpowering the core melody. The instrumentation should include gentle guitar or piano, soft percussion, and atmospheric pads or synths to enhance the dreamy, introspective feel. The track should be between 2 to 4 minutes long, suitable for a public broadcaster.

And this is what it sounds like:

Song #2

For ‘wietske-b-2.mp3’ I used: Create a high-energy pop-rock track infused with modern synth elements. The song should feature driving drums, a tight bassline, and rhythmic electric guitars with a mix of clean and overdriven tones. Synths should add depth with lush pads, arpeggiated sequences, and subtle electronic effects. The track should feel anthemic and uplifting, with a dynamic build and a powerful, memorable chorus. 2-4 minutes.

And this is what it sounds like:

Structure

In this portfolio I will walk you through the analysis of these two songs. We will first analyze the two songs on their own, moving from timbre to harmony and then to rhythm. This order reflects how we perceive music: first the overall sound color, then the harmonic structure, and finally the rhythm. Finally, we will compare the songs to the entire class corpus to see how they are positioned within the dataset in terms of musical features.

Timbre

Visual Song #1

Cepstrogram Song #1

Timbre-based SSM Song #1

Visual Song #2

Cepstrogram Song #2

Timbre-based SSM Song #2

Description

Cepstrograms

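The cepstrograms show how the ‘sound color’ (timbre) of a song changes over time. Each row is a cepstral coefficient and the color shows its magnitude: dark blue means a low value and yellow a high one. The lower coefficients at the bottom of the diagram describe the coarse shape of the spectrum, such as overall loudness and brightness, while the higher coefficients at the top describe finer spectral detail and texture.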

In the first song’s diagram, the colors stay mostly yellow and orange, meaning the sound color stays fairly steady. There are no big jumps from dark to bright, so the song has no sudden loud parts or dramatic changes. The bottom of the diagram is also bright, showing that the lowest coefficients are strong. The top is mostly the same color, meaning the finer texture of the sound is stable. This matches what we hear: the instruments play steadily, and the sound color does not change much.

The second song’s diagram is a bit different. There is a little more activity higher up, around coefficient 3, but the strongest activity is still around coefficient 1. This means the song has some small changes in sound color, but nothing too big. It may also suggest that we are hearing two main instruments with different sound colors: when we listen, we can clearly hear the electric guitar and drums, which could correspond to the activity around these two coefficients. These instruments create the main sound and feel of this song.
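To make the idea of a cepstrogram more concrete, here is a minimal Python sketch (using NumPy) of how cepstral coefficients can be computed: cut the audio into overlapping frames, take the log power spectrum of each frame, and apply a discrete cosine transform (DCT). This is an assumption about the general technique, not the exact pipeline behind the plots above. Real MFCC-based cepstrograms also apply a mel filterbank, which is left out here for brevity, and the frame length, hop size and number of coefficients are arbitrary choices. In this simplified version, making the whole signal louder only shifts the first coefficient (index 0), because a constant offset in the log spectrum ends up entirely in the DCT's constant term.

```python
import numpy as np


def dct_ii(x: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Unnormalised DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi / n * (i + 0.5) * k)  # shape (n_coeffs, n)
    return x @ basis.T


def cepstrogram(signal: np.ndarray, frame_len: int = 1024, hop: int = 512,
                n_coeffs: int = 12) -> np.ndarray:
    """Return an (n_frames, n_coeffs) array of cepstral coefficients (no mel filterbank)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[t * hop:t * hop + frame_len] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_power = np.log(power + 1e-12)  # small floor avoids log(0)
    return dct_ii(log_power, n_coeffs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(22050)  # one second of noise at 22.05 kHz
    quiet = cepstrogram(noise)
    loud = cepstrogram(10 * noise)
    print(quiet.shape)
    # Only coefficient 0 should differ between the quiet and loud versions.
    print(np.allclose(quiet[:, 1:], loud[:, 1:], atol=1e-3))
```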

SSM

Looking at the self-similarity matrix (SSM) for the first song, we can see clear ‘block-like’ patterns. These patterns tell us that there are sections of the song where the overall sound, or timbre, stays very consistent. For example, there is a large block that starts around 60 seconds and lasts until about 120 seconds. If we listen to the song, we can hear a change around 50-60 seconds, after which the sound remains quite homogeneous for the next minute. We also notice a similar block from about 130 to 170 seconds.

Interestingly, apart from the main diagonal line (which just shows that every moment is identical to itself), there are no clear diagonal stripes parallel to it. This suggests that while the song might feel repetitive, it does not contain repeated sequences of timbre. Instead, it has sections where the overall ‘sound color’ remains the same. The predominantly dark tones in the SSM indicate that, on a timbre level, the song is fairly consistent throughout.

For the second song the SSM is harder to analyze, as the diagram shows neither a classic block-like structure nor a classic path-like structure. Instead, there seems to be a ‘build-up’ in timbral similarity: the first part of the song contains many more dissimilar segments than the second part. In the second part there seems to be a block from around 75 seconds until the end, within which the timbre is very similar, crossed by only one ‘stripe’ of dissimilarity running vertically (and, because the matrix is symmetric, horizontally).
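The basic computation behind an SSM can be sketched in a few lines of Python. This minimal sketch assumes one feature vector per time frame (for example, the cepstral coefficients of each frame) and uses cosine distance, so 0 means identical and larger values mean more different. The distance measure and any normalization used for the plots above may differ. Two sections with stable but different timbre show up as two square blocks of small distances along the diagonal.

```python
import numpy as np


def self_similarity(features: np.ndarray) -> np.ndarray:
    """Cosine-distance SSM for an (n_frames, n_dims) array: 0 = same direction."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return 1.0 - unit @ unit.T


if __name__ == "__main__":
    # Hypothetical features: three frames of one timbre, then three of another.
    feats = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
    print(np.round(self_similarity(feats), 2))
```

Path-like structure (repeated passages) would instead appear as diagonal stripes parallel to the main diagonal, which a block-only example like this one cannot produce.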

Conclusion

In conclusion, the two songs differ quite a bit in timbre. While the timbre of the first song is dominated by coefficient 1 throughout, the timbre of the second song seems to include two distinct sound colors, probably caused by the two main instruments you hear: electric guitar and drums. And while the first song has distinct sections with a homogeneous timbre, the timbre of the second song becomes more and more self-similar as the song progresses.

Harmony & Pitch

Chromagrams

Chromagram Song #1

Chromagram Song #2

Chordograms

Song #1 Chordogram

Song #2 Chordogram

Description

Now we move on to the harmony of the songs, which is shown using chromagrams and chordograms. Comparing the chromagrams, you can spot a few differences. In the first song, the pitch classes around F#/Gb are mostly light colored, which indicates that these pitch classes are present. In the last part, pitch classes A and B become more present. Overall there is a lot of variation in pitch classes within this song. The same holds for the second song, where the start is dominated by pitch class B; its presence declines a bit as other pitch classes such as C, D and F#/Gb become more present. Note that a chromagram folds all octaves into twelve pitch classes, so it shows which pitch classes are present rather than how high or low the notes are. The strong presence of B in the second song may come from the electric guitar and drums, which dominate the sound of this song.
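A chroma vector can be sketched by mapping every frequency bin of a spectrum to its nearest pitch class and summing the energy per pitch class. This minimal NumPy sketch assumes equal temperament with A4 = 440 Hz and ignores everything below 27.5 Hz. It is not the feature extractor behind the chromagrams above. Because the mapping only keeps the pitch class (MIDI note number modulo 12), a 220 Hz tone and a 440 Hz tone both end up in A.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(frame: np.ndarray, sr: int, fmin: float = 27.5) -> np.ndarray:
    """Fold the power spectrum of one frame into 12 pitch classes (max = 1)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    chroma = np.zeros(12)
    for f, power in zip(freqs, spectrum):
        if f < fmin:
            continue  # skip DC and sub-audio bins (log2(0) is undefined)
        midi = 69 + 12 * np.log2(f / 440.0)  # A4 = 440 Hz = MIDI note 69
        chroma[int(round(midi)) % 12] += power  # octave information is discarded here
    return chroma / chroma.max()


if __name__ == "__main__":
    sr = 22050
    t = np.arange(4096) / sr
    for freq in (220.0, 440.0, 261.63):
        c = chroma_vector(np.sin(2 * np.pi * freq * t), sr)
        print(freq, PITCH_CLASSES[int(np.argmax(c))])
```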

The chordograms are less intuitive than the chromagrams, and it is harder to see what they actually show. For both songs I used the cosine distance and the chord templates. The lighter parts are supposed to show the chords being played, but in both songs many different chords appear to match at the same time, which makes the diagrams confusing. In this case it is therefore better to focus on the chromagrams when analyzing the harmony of both songs.
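Template-based chord estimation, as used for the chordograms, compares each chroma vector with a set of binary chord templates. The sketch below assumes 24 templates (the major and minor triads) and cosine distance; the actual template set used for the chordograms above may contain more chord types. It also illustrates why chordograms can look confusing: when the energy is spread evenly over all pitch classes, every triad template is exactly equally distant (0.5), so no single chord stands out.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def triad_templates() -> dict:
    """Binary chroma templates for the 12 major and 12 minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            template = np.zeros(12)
            template[[(root + i) % 12 for i in intervals]] = 1.0
            templates[PITCH_CLASSES[root] + suffix] = template
    return templates


def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def chord_distances(chroma: np.ndarray) -> dict:
    """One column of a chordogram: distance from a chroma vector to every template."""
    return {name: cosine_distance(chroma, t) for name, t in triad_templates().items()}


if __name__ == "__main__":
    g_major = np.zeros(12)
    g_major[[7, 11, 2]] = 1.0  # G, B, D
    dists = chord_distances(g_major)
    print(min(dists, key=dists.get))  # G
```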

Harmony & Pitch 2

Chroma-based Self Similarity Matrices

Song #1 Chroma-based Self Similarity Matrix

Song #2 Chroma-based Self Similarity Matrix

Description

Just like we looked at the timbre-based SSMs, we can also look at SSMs based on chroma, which tells us about the chords and pitches in the songs. In both songs, we see ‘block-like’ patterns, meaning there are sections where the chords and pitches stay pretty much the same. However, the size of these blocks is different.

In the first song, the blocks are quite long, about 10-20 seconds. This means that for those periods, the chords and pitches are very homogeneous. In the second song, however, the blocks are shorter, usually no more than 5-10 seconds. This tells us that the chords and pitches change more often in the second song.

So, we can say that the first song has longer stretches where the chords and pitches are stable, while the second song has more frequent changes. This fits with what we saw in the chromagrams, where the first song also showed more stable chroma than the second song.
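The block lengths above were read off the SSM plots by eye. One common way to locate the boundaries between such blocks automatically is Foote's checkerboard-kernel novelty, which was not part of this analysis. The sketch below slides a small checkerboard kernel along the diagonal of a distance-based SSM, converted to similarity as 1 - distance (this assumes distances between 0 and 1, as for cosine distance on non-negative chroma vectors). The kernel size is an arbitrary choice, and peaks in the result mark likely block boundaries.

```python
import numpy as np


def checkerboard_kernel(half: int) -> np.ndarray:
    """(2*half x 2*half) kernel: +1 on within-segment quadrants, -1 on cross quadrants."""
    sign = np.concatenate([-np.ones(half), np.ones(half)])
    return np.outer(sign, sign)


def foote_novelty(distance: np.ndarray, half: int = 4) -> np.ndarray:
    """Novelty per frame from a distance SSM with values in [0, 1]."""
    similarity = 1.0 - distance
    kernel = checkerboard_kernel(half)
    n = distance.shape[0]
    novelty = np.zeros(n)
    for i in range(half, n - half + 1):
        # Window centred on the boundary between frame i-1 and frame i.
        window = similarity[i - half:i + half, i - half:i + half]
        novelty[i] = np.sum(kernel * window)
    return novelty


if __name__ == "__main__":
    # Hypothetical SSM: frames 0-4 form one block, frames 5-9 another.
    labels = np.array([0] * 5 + [1] * 5)
    distance = (labels[:, None] != labels[None, :]).astype(float)
    print(int(np.argmax(foote_novelty(distance))))  # 5
```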

Novelty functions

Visual Song #1

Energy-based Song #1

Spectral-based Song #1

Visual Song #2

Energy-based Song #2

Spectral-based Song #2

Description

We used two types of novelty functions: energy-based and spectral-based. The energy-based novelty function is better suited to songs with strong onsets or a strong beat, whereas spectral-based novelty is more suitable for songs with a more melodic sound and a less prominent beat. The energy-based function for the first song shows a few peaks, but not as many as for the second song, which seems to have a more stable beat. For the first song, the spectral-based function gives a more interesting diagram than the energy-based one. This can be explained by the fact that this song is more melodic and does not have a strong beat, which makes the spectral-based function more fitting. For the second song, on the other hand, the energy-based function is a better fit, because the drums produce a strong beat. For example, it shows a big onset around 150 seconds, which is clearly audible in the song.
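The difference between the two novelty functions can be sketched in NumPy. In this minimal version (frame length, hop size and the log compression are assumptions, not the settings used for the plots above), energy-based novelty is the half-wave rectified increase in log frame energy, while spectral-based novelty is spectral flux: the summed increase in log magnitude over all frequency bins. A note change at constant loudness, as in the usage example, barely moves the energy-based curve but produces a clear spectral peak, which is why the spectral version suits more melodic material.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])


def energy_novelty(x: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Half-wave rectified increase in log energy between consecutive frames."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    log_energy = np.log1p(np.sum(frames ** 2, axis=1))
    return np.maximum(np.diff(log_energy), 0.0)


def spectral_novelty(x: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Spectral flux: summed half-wave rectified increase in log magnitude per bin."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    log_mag = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))
    return np.maximum(np.diff(log_mag, axis=0), 0.0).sum(axis=1)


if __name__ == "__main__":
    sr = 22050
    t = np.arange(sr) / sr
    # One second: 440 Hz for the first half, 660 Hz for the second, same loudness.
    x = np.where(t < 0.5, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 660 * t))
    print("energy peak:  ", energy_novelty(x).max())
    print("spectral peak:", spectral_novelty(x).max())
```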

Rhythm and Tempo

Visuals (cyclic = TRUE)

Tempogram Song #1

Tempogram Song #2

Description

Now on to the tempo and rhythm of my songs, shown in the tempograms for the two songs. The tempogram of the first song shows a tempo that alternates around 140 BPM. Although the song does sound a bit repetitive, which was also supported by both SSMs, the tempogram does not really show this. The alternating tempo is probably caused by the fact that this is, as mentioned before, more of a melodic song than a song with a strict tempo.

The second song, on the other hand, does seem to have quite a strict tempo, slightly faster than the first song at about 150 BPM. Again, this can be attributed to the drums, which produce a strong beat.
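A tempogram is essentially a sliding-window frequency analysis of a novelty function. The sketch below computes a single column of a Fourier-based tempogram: it measures how strongly the whole novelty curve oscillates at each candidate tempo between 60 and 200 BPM. The frame rate, tempo range and synthetic input are assumptions for illustration. A full tempogram repeats this over short windows, and a cyclic tempogram additionally merges tempi that differ by a factor of two.

```python
import numpy as np


def tempo_strengths(novelty: np.ndarray, frame_rate: float,
                    bpm_min: int = 60, bpm_max: int = 200):
    """Magnitude of the novelty curve's Fourier component at each candidate tempo."""
    t = np.arange(len(novelty)) / frame_rate  # frame times in seconds
    centred = novelty - novelty.mean()        # remove the constant (DC) component
    bpms = np.arange(bpm_min, bpm_max + 1)
    strengths = np.array([
        abs(np.sum(centred * np.exp(-2j * np.pi * (bpm / 60.0) * t)))
        for bpm in bpms
    ])
    return bpms, strengths


if __name__ == "__main__":
    frame_rate = 100.0          # novelty frames per second (assumed)
    novelty = np.zeros(3000)    # 30 seconds of novelty values
    novelty[::40] = 1.0         # one pulse every 0.4 s, i.e. 150 BPM
    bpms, strengths = tempo_strengths(novelty, frame_rate)
    print(bpms[np.argmax(strengths)])  # 150
```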

Therefore we can conclude that a tempogram is probably less informative and less relevant for the first song than for the second.

Class Corpus AI Analysis: Heat Map

HeatMap

Description

Now that we have analyzed both songs in depth, we move on to the last part of this portfolio: the class corpus. We will look at how my songs fit into the class corpus, starting with the heat map. This heat map shows where my songs are placed compared to the other songs in the corpus in terms of valence, arousal, danceability, instrumentalness and tempo. To read the heat map properly, you need to zoom in a bit.

For the first song, valence and arousal are quite low, around -0.8, while danceability (0.4), instrumentalness (0.3) and especially tempo (1.3) show higher values.

For the second song, valence, arousal and instrumentalness are all around 0, danceability is around 1.6, and tempo is relatively low at around -1.1.

From this we can conclude that, relative to the rest of the corpus, the first song has low valence and arousal, while the second song has medium valence and arousal. The instrumentalness of the first song is medium, just like that of the second song. Danceability is medium for the first song and high for the second. Tempo is relatively high for the first song and low for the second, which is the opposite of what the tempograms suggested (about 140 BPM for the first song and 150 BPM for the second).
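The heat-map values lie around 0 (between about -1.1 and 1.6 for my songs), which looks like standardized scores: for each feature, a song's value minus the corpus mean, divided by the corpus standard deviation. Assuming that is the case (the portfolio does not say so), a value of -0.8 means a little under one standard deviation below the corpus average. The sketch below standardizes each column of a feature table. The numbers are made-up toy values, and `ddof=1` mirrors the sample standard deviation used by R's `scale()`.

```python
import numpy as np


def standardize(features: np.ndarray) -> np.ndarray:
    """Z-score each column (feature) of an (n_songs, n_features) array."""
    mean = features.mean(axis=0)
    std = features.std(axis=0, ddof=1)  # sample standard deviation, as in R's scale()
    return (features - mean) / std


if __name__ == "__main__":
    # Hypothetical toy corpus: columns are valence (0-1) and tempo (BPM).
    toy = np.array([[0.2, 100.0], [0.4, 120.0], [0.6, 140.0], [0.8, 160.0]])
    print(np.round(standardize(toy), 2))
```

After standardizing, features measured on very different scales (such as valence and BPM) become directly comparable, which is what makes a single color scale in a heat map meaningful.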

Class Corpus AI Analysis: Classification

          Truth
Prediction AI Non-AI
    AI     30     18
    Non-AI 19     23

class   precision  recall
AI      0.625      0.612
Non-AI  0.548      0.561
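The precision and recall values can be recomputed from the confusion matrix: precision divides the correct predictions for a class by everything predicted as that class (a row of the matrix), and recall divides them by everything that truly belongs to that class (a column). The small Python check below uses only the counts from the KNN confusion matrix.

```python
def precision_recall(confusion: dict, classes: list) -> dict:
    """confusion[predicted][truth] holds counts; returns {class: (precision, recall)}."""
    scores = {}
    for c in classes:
        true_pos = confusion[c][c]
        predicted_as_c = sum(confusion[c][t] for t in classes)  # row total
        actually_c = sum(confusion[p][c] for p in classes)      # column total
        scores[c] = (true_pos / predicted_as_c, true_pos / actually_c)
    return scores


# Counts from the KNN confusion matrix (outer key: prediction, inner key: truth).
CONFUSION = {
    "AI":     {"AI": 30, "Non-AI": 18},
    "Non-AI": {"AI": 19, "Non-AI": 23},
}

if __name__ == "__main__":
    for cls, (p, r) in precision_recall(CONFUSION, ["AI", "Non-AI"]).items():
        print(f"{cls:7s} precision={p:.3f} recall={r:.3f}")
```

Running it reproduces the four values in the table, which confirms that they were computed per class from this matrix.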

Description

We also built a classification model using the class corpus, namely a k-nearest-neighbours (KNN) model: a classifier that predicts whether a song in the class corpus is AI-generated or not. Looking at the precision and recall of the model, the AI class has higher precision and recall than the Non-AI class, suggesting that the model is better at recognizing AI-generated songs. However, there is still room for improvement for both classes, since none of the values are particularly high.
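The idea behind a KNN classifier fits in a few lines: find the k training songs whose feature vectors are closest to the new song and let them vote. The sketch below is a minimal NumPy version with Euclidean distance. The value of k, the features and the toy data are hypothetical, since the portfolio does not state which were used. Because KNN relies on distances, features are usually standardized first so that, for example, tempo in BPM does not outweigh features between 0 and 1.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x: np.ndarray, train_y: list, query: np.ndarray, k: int = 3) -> str:
    """Majority label among the k training points closest (Euclidean) to the query."""
    distances = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical, already standardized features (e.g. arousal, danceability)
    # with made-up labels; not taken from the class corpus.
    train_x = np.array([[-1.0, -1.2], [-0.8, -1.0], [-1.1, -0.7],
                        [1.0, 0.9], [0.8, 1.2], [1.2, 1.0]])
    train_y = ["AI", "AI", "AI", "Non-AI", "Non-AI", "Non-AI"]
    print(knn_predict(train_x, train_y, np.array([-0.9, -0.9])))  # AI
    print(knn_predict(train_x, train_y, np.array([1.1, 1.1])))    # Non-AI
```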

Class Corpus AI Analysis: Random Forest

class   precision  recall
AI      0.654      0.694
Non-AI  0.605      0.561

Description

On to the last part: the Random Forest. First we created a feature importance diagram, in which arousal turned out to be the most important feature. We used this for the Random Forest, and the scatter plot shows the result. From the scatter plot you cannot really draw any straightforward conclusions, as the yellow and purple dots do not cluster as clearly as we would have liked. Bigger dots represent a higher tempo, but even this feature does not allow us to draw any hard conclusions. The Random Forest does achieve slightly higher precision for both classes and higher recall for the AI class than the KNN model, but apart from that it does not give us the new insight we hoped for.

Conclusion

During this project I analyzed two of my own AI-generated songs. These were two very different songs, and this was also reflected in the analyses I performed.

From the self-similarity matrices to the tempograms, almost every diagram showed differences between the two songs, in all kinds of ways. The first song turned out to be more melodic, with homogeneous sections of longer duration in both timbre and chroma. Its melodic character is also supported by the spectral-based novelty function, which was a much better fit than the energy-based function. The chromagrams also showed that different pitch classes stand out in the first song than in the second, with F#/Gb and later A and B being most prominent.

For the second song, both the tempogram and the energy-based novelty function, which turned out to be the better fit for this song, showed a strong beat, most likely produced by the drums. In its chromagram, pitch class B dominated at the start, with C, D and F#/Gb becoming more present later on. Finally, the presence of two main instruments, the drums and the electric guitar, made its cepstrogram stand out from that of the first song.

Overall, this project helped me gain insight into the differences between songs, and into how many levels these differences can be analyzed and justified with the support of all these different types of diagrams. It is truly mind-blowing!